1
کامپیوتر و شبکه::
شبکۀ عصبی پیچشی، شبکه عصبی پیچشی، شبکه عصبی پیچشی
Keywords: Video Captioning , NLP , CNN ,RNN ,TDL Module ,
(2.4) General Overview of an Encoder-Decoder Based Framework Fig (2.5) Classification of Different Deep Learning Methods in Video Captioning Fig(2.6) General overview of CNN - RNN Video Caption Generation Method.
In picture captioning, global characteristics are retrieved from the hidden layers of deep CNN architecture, and the language model is either LSTM [14] or GRU [15].
After that, 3D CNN architectures were developed to capture the spatio-temporal information [18], speeding up the feature extraction process and broadening this study field to consider and work on large-scale multimodal datasets.
The extraction of visual features using a pre-trained Convolutional Neural Network (CNN) or Recurrent Neural Network (RNN)-based model and then feeding that to an RNN-based language model to generate variable-length output vector is a general approach to single sentence video captioning for time-varying inputs.،،
واژگان شبکه مترجمین ایران